Part 2. Why the AI Hardcoded the Expected Answers — Comparing Three Models for Code Generation
We gave three AI models only the rule specification (in Japanese text) and asked them to write Python check code on their own. Comparing Gemini Flash, Claude Haiku, and Claude Sonnet, we found Gemini Flash had been hardcoding the expected answers.
Read →